Implementation Complexity of Bit Permutation Instructions
نویسنده
چکیده
Several bit permutation instructions, including GRP, OMFLIP, CROSS, and BFLY, have been proposed recently for efficiently performing arbitrary bit permutations. Previous work has shown that these instructions can accelerate a variety of applications such as block ciphers and sorting algorithms. In this paper, we compare the implementation complexity of these instructions in terms of delay. We use logical effort, a process technology independent method, to estimate the delay of the bit permutation functional units. Our results show that for 64-bit operations, the BFLY instruction is the fastest among these bit permutation instructions; the OMFLIP instruction is next; and the GRP instruction is the slowest.
منابع مشابه
Bit Permutation Instructions for Accelerating Software Cryptography
Permutation is widely used in cryptographic algorithms. However, it is not well-supported in existing instruction sets. In this paper, two instructions, PPERM3R and GRP, are proposed for efficient software implementation of arbitrary permutations. The PPERM3R instruction can be used for dynamically specified permutations; the GRP instruction can be used to do arbitrary n-bit permutations with u...
متن کاملSubword Sorting with Versatile Permutation Instructions
Subword parallelism has succeeded in accelerating many multimedia applications. Subword permutation instructions have been proposed to efficiently rearrange subwords in or among registers. Bit-level permutation instructions have also been proposed recently for their importance in cryptography. However, some important algorithms, especially ones with lots of conditional control dependencies such...
متن کاملImplementation of Efficient Bit Permutation Box for Embedded Security
Security in every real time applications is of utmost importance. The secure architecture implemented in the automobiles such as EVITA (E-safety Vehicle Intrusion protected Application), SEVECOM (Secure Vehicle Communication) has rich cryptographic properties, but has more footprint area and high power consumption. This existing architecture uses standard engines like AES (Advanced encryption s...
متن کاملEfficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields
This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF (2m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...
متن کامل